Abstract

We investigate the construction of prefix-free and fix-free codes with specified codeword
compositions. We present a polynomial time algorithm which constructs a fix-free code with the same
codeword compositions as a given code for a special class of codes called distinct codes. We consider
the construction of optimal fix-free codes which minimize the average codeword cost for general letter
costs with uniform distribution of the codewords and present an approximation algorithm to find a near
optimal fix-free code with a given constant cost.
I. Introduction



The basic elements of a discrete communication system are its source, encoder, channel,
decoder and destination. The source may be represented as a random variable, $X$, taking on
values from the set of source characters $\{x_1, x_2, \dots, x_M\}$ with probabilities $p_1, p_2, \dots, p_M$,
respectively. A message is a sequence of source characters. To facilitate transmission, the encoder
associates with every source character $x_i$ a finite sequence of code characters $a_1, a_2, \dots, a_D$
($D$-ary). Such a sequence of code characters is called a codeword. A code, denoted by $S$, is the
collection of all codewords. The encoded message is then transmitted over the channel, which
we assume to be noiseless. At the receiving end, the decoder attempts to reproduce the original
message by assigning a sequence of source characters to the coded message.

To avoid ambiguity, every finite sequence of code characters must correspond to no more than
one message. A code that conforms with this requirement is said to be a uniquely decodable
code. Furthermore, to simplify the decoding procedure, two other types of codes are often used in
communication systems, defined as follows. If no codeword is a prefix of some other codeword,
the code is said to be a prefix-free code, and if no codeword is a prefix or suffix of some
other codeword, the code is said to be a fix-free code. We denote the set of all codes, uniquely
decodable codes, prefix-free codes and fix-free codes that can be constructed from the code
characters $\{a_1, a_2, \dots, a_D\}$ by $C^D$, $C^D_{ud}$, $C^D_{pf}$ and $C^D_{ff}$, respectively. Throughout the paper, the
superscript $D$ is omitted for binary codes. In general, it follows directly from the definitions that
$C^D \supset C^D_{ud} \supset C^D_{pf} \supset C^D_{ff}$. We illustrate this with the following example.

Example 1. Consider the following four binary codes,

$S_1 = \{00, 10, 11\}$
$S_2 = \{00, 10, 11, 011\}$
$S_3 = \{00, 10, 11, 110, 100\}$
$S_4 = \{0, 001, 100, 110\}$.

$S_1$ is a fix-free code ($S_1 \in C_{ff}$); $S_2$ is a prefix-free code but is not fix-free ($S_2 \in C_{pf}$, $S_2 \notin C_{ff}$);
$S_3$ is a uniquely decodable code but is neither prefix-free nor fix-free ($S_3 \in C_{ud}$, $S_3 \notin C_{pf}$,
$S_3 \notin C_{ff}$); and $S_4$ is neither uniquely decodable, prefix-free nor fix-free ($S_4 \in C$, $S_4 \notin C_{ud}$,
$S_4 \notin C_{pf}$, $S_4 \notin C_{ff}$).
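The classification in Example 1 can be checked mechanically. The following is a minimal sketch, assuming codewords are given as Python strings over '0' and '1'. The function names are illustrative, and the unique-decodability test uses the standard Sardinas-Patterson procedure, which is not taken from this paper.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    return not any(u != v and v.startswith(u) for u in code for v in code)


def is_fix_free(code):
    """True if no codeword is a prefix or a suffix of a different codeword."""
    return not any(
        u != v and (v.startswith(u) or v.endswith(u)) for u in code for v in code
    )


def is_uniquely_decodable(code):
    """Sardinas-Patterson test (standard technique, assumed here for illustration)."""
    code = set(code)

    def dangling(left, right):
        # Nonempty words w with u + w == v for some u in left and v in right.
        return {v[len(u):] for u in left for v in right
                if v != u and v.startswith(u)}

    current = dangling(code, code)
    seen = set()
    while current:
        if current & code:
            return False  # a dangling suffix is itself a codeword
        seen |= current
        current = (dangling(code, current) | dangling(current, code)) - seen
    return True
```

Applied to the four codes of Example 1, these checks agree with the memberships stated there.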

Let $S = \{s_1, s_2, \dots, s_n\}$ be a code. The composition of a codeword $s_k$, $k = 1, 2, \dots, n$, is
written as $(\delta_1^{(k)}, \delta_2^{(k)}, \dots, \delta_D^{(k)})$, where $\delta_i^{(k)}$ is the number of times the code character $a_i$ appears
in the codeword $s_k$. Suppose that a set of costs $\{c_1, c_2, \dots, c_D\}$ is associated with the respective
code characters $\{a_1, a_2, \dots, a_D\}$, i.e. $c_i$ is the positive cost corresponding to $a_i$, $i = 1, 2, \dots, D$. Then the
average codeword cost of the code $S$ is equal to

$$\sum_{k=1}^{n} p_k \left( \sum_{i=1}^{D} c_i \, \delta_i^{(k)} \right) \qquad (1)$$

where $p_k$ is the probability assigned to $s_k$, $k = 1, 2, \dots, n$.



Example 2. The message αβρακαδαβραααρ can be considered to be a 6-ary message over
the alphabet {α, β, κ, δ, χ, ρ}. Its length is 14, and its composition vector is (7, 2, 1, 1, 0, 3).
Assuming respective symbol costs (1, 3, 3, 2, 10, 1), the cost is 21.
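The composition and cost computations of Example 2 and of formula (1) can be reproduced with a few lines. This is a minimal sketch. The function names are illustrative, and the Greek letters of the example are transliterated to ASCII (a, b, k, d, x, r) purely for convenience.

```python
from collections import Counter


def composition(word, alphabet):
    """Number of occurrences of each alphabet character in `word`, in alphabet order."""
    counts = Counter(word)
    return tuple(counts[ch] for ch in alphabet)


def word_cost(word, alphabet, costs):
    """Cost of one word: sum over characters of (letter cost) * (number of occurrences)."""
    return sum(c * d for c, d in zip(costs, composition(word, alphabet)))


def average_codeword_cost(code, probs, alphabet, costs):
    """Average codeword cost as in formula (1): sum_k p_k * cost(s_k)."""
    return sum(p * word_cost(w, alphabet, costs) for w, p in zip(code, probs))


# Example 2, with ASCII stand-ins for the six Greek letters.
message = "abrakadabraaar"
alphabet = "abkdxr"
print(composition(message, alphabet))                    # (7, 2, 1, 1, 0, 3)
print(word_cost(message, alphabet, (1, 3, 3, 2, 10, 1)))  # 21
```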

It is known that for equal costs, i.e., $c_1 = c_2 = \dots = c_D$, Huffman's algorithm derives
an optimal prefix-free code, but when the costs $c_1, c_2, \dots, c_D$ are not all equal, the composition
of the codewords becomes important. The problem of constructing an optimal code that minimizes
the average cost has been considered for prefix-free codes in [3], [8]. The construction of optimal
fix-free codes that minimize the average code length, i.e. for equal letter costs, has also been
considered recently. Upper bounds on the average code length of optimal fix-free codes, for equal
letter costs but general probability distributions of the alphabet symbols, have been provided as
well. In contrast, in this work we consider the construction of optimal fix-free codes which
minimize the average codeword cost for general letter costs with uniform distribution of the
codewords.

As mentioned above, when the costs are unequal, the composition of the codewords plays an
important role in constructing optimal codes. In this paper, we provide a necessary and sufficient
condition for the existence of a $D$-ary prefix-free code with a given set of compositions (this is
an immediate extension of Proposition 2 of [2] to $D$-ary codes) and then we present a polynomial
time algorithm that results in a binary prefix-free code with the same composition set as a given code.
We also present an algorithm to find a fix-free code for a given set of compositions of a special
class of codes that we call distinct codes, if such a fix-free code exists. Consequently, we present
an approximation algorithm to find a near optimal fix-free code with a given constant cost. All
the algorithmic results refer to binary codes.
II. Prefix-free codes

In the following, we present a necessary and sufficient condition for the existence of a $D$-ary
prefix-free code with a given set of codeword compositions, which is an immediate extension of
Proposition 2 of [2] to $D$-ary codes. Then, we establish a polynomial time algorithm to find a
binary prefix-free code with a given composition set.

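Theorem 1 ([2]). Let $A = \{(\delta_1^{(k)}, \delta_2^{(k)}, \dots, \delta_D^{(k)}),\ 1 \le k \le n\}$ be the set of codeword
compositions of some code $S$ (with $n$ codewords). Then there exists a prefix-free code with
the same set of codeword compositions if and only if the following inequality holds for each
$(\delta_1^{(k)}, \dots, \delta_D^{(k)}) \in A$ (the length of any codeword $s_k \in S$, $1 \le k \le n$, is denoted by $l_k$, i.e.
$l_k = \sum_{i=1}^{D} \delta_i^{(k)}$):

$$\binom{l_k}{\delta_1^{(k)}, \dots, \delta_D^{(k)}} \;\ge \sum_{\substack{(\xi_1, \dots, \xi_D) \in A \\ \xi_i \le \delta_i^{(k)},\ 1 \le i \le D}} \lambda_{(\xi_1, \dots, \xi_D)} \binom{l_k - \sum_{i=1}^{D} \xi_i}{\delta_1^{(k)} - \xi_1, \dots, \delta_D^{(k)} - \xi_D} \qquad (2)$$

where $\lambda_{(\xi_1, \dots, \xi_D)}$ is the number of codewords of composition $(\xi_1, \xi_2, \dots, \xi_D)$ in $S$, the sum runs
over the distinct compositions in $A$ (including $(\delta_1^{(k)}, \dots, \delta_D^{(k)})$ itself), and
$\binom{l}{\delta_1, \dots, \delta_D} = l! / (\delta_1! \cdots \delta_D!)$ denotes the multinomial coefficient.

Proof: The number of all words of composition $(\delta_1^{(k)}, \dots, \delta_D^{(k)})$ is the multinomial coefficient
$\binom{l_k}{\delta_1^{(k)}, \dots, \delta_D^{(k)}}$, which can equivalently be written as the product of binomial coefficients
$\prod_{i=1}^{D} \binom{l_k - \sum_{j<i} \delta_j^{(k)}}{\delta_i^{(k)}}$.
In addition, it is clear that the number of words of composition $(\delta_1^{(k)}, \dots, \delta_D^{(k)})$ that have a given
word of composition $(\xi_1, \xi_2, \dots, \xi_D)$ as a prefix is
$\binom{l_k - \sum_i \xi_i}{\delta_1^{(k)} - \xi_1, \dots, \delta_D^{(k)} - \xi_D}$. Therefore, the necessity of
the condition follows because the number of all words of composition $(\delta_1^{(k)}, \delta_2^{(k)}, \dots, \delta_D^{(k)})$
must be at least the number of words of this composition that are either used as codewords or
removed by the prefix condition because they extend a codeword of composition $(\xi_1, \xi_2, \dots, \xi_D)$.

To prove the sufficiency of the condition, we construct a prefix code with the given compositions
by an algorithm. We start from the shorter codewords. At each iteration, if we need $\lambda_{(\delta_1^{(k)}, \dots, \delta_D^{(k)})}$
codewords of composition $(\delta_1^{(k)}, \delta_2^{(k)}, \dots, \delta_D^{(k)})$, then by the composition inequality there are at
least $\lambda_{(\delta_1^{(k)}, \dots, \delta_D^{(k)})}$ words with composition $(\delta_1^{(k)}, \dots, \delta_D^{(k)})$ such that none of them
has a prefix in the previous set of codewords. Hence, the constructed code is a prefix code
with composition set $A$. ■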

Example 3. Let $A := \{(2, 0), (1, 1), (3, 1)\}$ (where $(a, b)$ represents the composition of a
codeword with $a$ zeros and $b$ ones). By Theorem 1, the existence of a binary prefix code with
composition set $A$ is guaranteed because, for the composition $(3, 1)$,

$$\binom{4}{3} = 4 \;\ge\; \binom{2}{1} + \binom{2}{2} + \binom{0}{0} = 4,$$

and the inequalities for $(2, 0)$ and $(1, 1)$ can be checked in the same way.
For example, {00, 01, 1000} is a binary prefix code with composition set $A$. Now, suppose that
one more composition (1, 1) is added to $A$, so define $A' := \{(2, 0), (1, 1), (1, 1), (3, 1)\}$.
Then there is no binary prefix code with composition set $A'$ because

$$\binom{4}{3} = 4 \;<\; \binom{2}{1} + 2\binom{2}{2} + \binom{0}{0} = 5.$$
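The existence condition of Theorem 1 is easy to evaluate numerically. The sketch below assumes compositions are given as tuples of non-negative integers, one entry per code character. The function names are hypothetical and the multinomial coefficient is computed directly from factorials.

```python
from collections import Counter
from math import factorial


def multinomial(parts):
    """Number of words with composition `parts`; 0 if any entry is negative."""
    if any(p < 0 for p in parts):
        return 0
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)
    return result


def prefix_free_code_exists(compositions):
    """Evaluate the inequality of Theorem 1 for every composition in the multiset."""
    multiplicity = Counter(map(tuple, compositions))
    for delta in multiplicity:
        # Words of composition delta that are codewords or extend a shorter codeword.
        needed = sum(
            count * multinomial([d - x for d, x in zip(delta, xi)])
            for xi, count in multiplicity.items()
        )
        if multinomial(delta) < needed:
            return False
    return True


print(prefix_free_code_exists([(2, 0), (1, 1), (3, 1)]))          # True
print(prefix_free_code_exists([(2, 0), (1, 1), (1, 1), (3, 1)]))  # False
```

The two calls correspond to the sets $A$ and $A'$ of Example 3.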

From now on all the results are presented for binary codes. In the following theorem we 
present a polynomial algorithm to find a prefix-free code with the same composition set as a 
given code S, if such a prefix-free code exists. 

Definition 1. For any word $s$ and two numbers $a$ and $b$, $f_{s,a,b}$ is the number of words
$s'$ with $a$ zeros and $b$ ones such that $s$ is a prefix of $s'$.

Theorem 2. For any code $S = \{s_1, s_2, \dots, s_n\}$, there is a polynomial time algorithm (polynomial
in terms of $n$ and the sum of the lengths of the $n$ codewords) which finds a prefix-free code with
the same composition set as the given code $S$, if there exists such a prefix-free code.

Proof: Without loss of generality, suppose $|s_1| \le |s_2| \le \dots \le |s_n|$, where $|w|$ is the length
of $w$. Our algorithm has $n$ iterations. In the $i$th iteration, we find a string $s'_i$ such that the
composition of $s'_i$ is the same as the composition of $s_i$ and $s'_j$ is not a prefix of $s'_i$ for any $j < i$,
as follows. After the $n$th iteration we reach the desired code $S' = \{s'_1, s'_2, \dots, s'_n\}$ with the same
composition set as the code $S$, and furthermore it is a prefix-free code.

Let $a$ and $b$ be the number of zeros and ones in $s_i$, respectively. If $\sum_{j=1}^{i-1} f_{s'_j,a,b} \ge \binom{a+b}{a}$, then
there is no code $S'$ with the desired properties. Otherwise, there is a string $s'_i$
with the mentioned conditions. We can find the lexicographically smallest such string $s'_i$ in
polynomial time as follows. We iteratively find the bits (code characters in the binary case) of $s'_i$.
For any string $x$ we can check whether there is a string $y$ with the same composition as $s_i$
such that $x$ is a prefix of $y$ and $s'_j$ is not a prefix of $y$ for any $j < i$. Existence of such a string
is equivalent to the property that the sum of $f_{z,a-c,b-d}$ over all words $z$ for which
$s'_j = xz$ for some $j < i$ (the notation $xz$ denotes the concatenation of the two words $x$ and $z$) is less
than the number of all words $w$ with $a - c$ zeros and $b - d$ ones ($c$ and $d$ are the number of
zeros and ones in $x$, respectively). Now, to find the smallest $s'_i$, we check whether there is an
$s'_i$ which starts with 0. If there is such a string, we set the first bit of $s'_i$ to zero. Otherwise, we set
it to one. Suppose we have set the first $l$ bits of $s'_i$ and we want to set the $(l+1)$th bit. We construct
the string $x$ by concatenating these $l$ bits. We check whether there is a string $y$ such
that its composition is the same as that of $s_i$, $x0$ is a prefix of $y$, and $s'_j$ is not a prefix of $y$ for any
$j < i$. If there exists such a string then the $(l+1)$th bit is zero. Otherwise, the $(l+1)$th bit is one.
After $|s_i|$ iterations we find the desired $s'_i$.

If there exists a code $S'$ whose composition set is the same as the composition set of the
code $S$ and which is prefix-free, we can find it iteratively as explained above. Note that our
algorithm has $n$ iterations, and in each of these iterations we compute the sum of at most
$n$ values of the function $f$. All these operations can be done in time polynomial in $n$ and the sum
of the lengths of the codewords. ■
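The greedy construction in the proof of Theorem 2 can be sketched as follows. This is an illustrative implementation under the assumption that codewords are strings over '0' and '1'. The function names are hypothetical, and the count of admissible completions is obtained by subtracting, from all words that extend the current partial string, those that also extend a previously chosen codeword.

```python
from math import comb


def count_with_prefix(p, zeros, ones):
    """Number of binary words with `zeros` zeros and `ones` ones that start with p."""
    z, o = zeros - p.count("0"), ones - p.count("1")
    return comb(z + o, z) if z >= 0 and o >= 0 else 0


def count_extensions(x, zeros, ones, chosen):
    """Words of the given composition that start with x and have no prefix in `chosen`."""
    total = count_with_prefix(x, zeros, ones)
    for s in chosen:
        if s.startswith(x):      # s extends x: remove the words that start with s
            total -= count_with_prefix(s, zeros, ones)
        elif x.startswith(s):    # x itself already starts with the codeword s
            total -= count_with_prefix(x, zeros, ones)
    return total


def prefix_free_with_same_compositions(code):
    """Return a prefix-free code with the same compositions, or None if none exists."""
    chosen = []
    for s in sorted(code, key=len):  # process shorter codewords first
        zeros, ones = s.count("0"), s.count("1")
        if count_extensions("", zeros, ones, chosen) <= 0:
            return None
        x = ""
        for _ in range(len(s)):
            # Prefer 0 whenever some admissible word starts with x + "0".
            x += "0" if count_extensions(x + "0", zeros, ones, chosen) > 0 else "1"
        chosen.append(x)
    return chosen
```

For instance, applied to the code $S_3$ of Example 1, which is not prefix-free, the sketch returns a prefix-free code with the same multiset of compositions.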

III. Fix-free codes 

In Theorem 6 we introduce a sufficient condition under which, for a class of codes that we
call distinct codes, there exists a fix-free code with the same composition set as the composition
set of a given code.

Definition 2. A code $S = \{s_1, s_2, \dots, s_n\}$ is distinct if for any $1 \le i, j \le n$, $a_i$ and $a_j$ satisfy
one of the following properties ($a_k$ is the length of the codeword $s_k$ for any $k = 1, 2, \dots, n$):

• $a_i = a_j$
• $2a_i \le a_j$
• $2a_j \le a_i$

This means that if any two codewords $s_i$ and $s_j$ do not have the same size, the size of one of
them should be at least twice the size of the other one.
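Definition 2 translates directly into a check on the set of codeword lengths. In the sketch below (the function name is illustrative), it suffices to compare consecutive distinct lengths, since the doubling condition then holds for every pair of different lengths.

```python
def is_distinct(code):
    """True if any two different codeword lengths differ by a factor of at least two."""
    lengths = sorted({len(s) for s in code})
    return all(2 * short <= long for short, long in zip(lengths, lengths[1:]))


print(is_distinct(["00", "10", "11", "0110"]))        # True  (lengths 2 and 4)
print(is_distinct(["00", "10", "11", "110", "100"]))  # False (lengths 2 and 3)
```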

In the following sequence of lemmas, we present some combinatorial facts that we refer to
in the proof of Theorem 6.

Lemma 3. For a string $s$ with $c$ ones and $d$ zeros, the number of strings which have $a$ ones and
$b$ zeros, and have $s$ as a prefix, is equal to $\binom{a+b-c-d}{a-c}$; i.e. $f_{s,a,b} = \binom{a+b-c-d}{a-c}$.

Lemma 4. For a string $s$ with $c$ ones and $d$ zeros, the number of strings which have $a$ ones and
$b$ zeros, and have $s$ as a suffix, is also equal to $\binom{a+b-c-d}{a-c}$.

Lemma 5. For any two strings $s_1$ with $c$ ones and $d$ zeros and $s_2$ with $e$ ones and $f$ zeros, the
number of strings which have $a$ ones and $b$ zeros, have $s_1$ as a prefix, and also have $s_2$ as a
suffix, is equal to $\binom{a+b-c-d-e-f}{a-c-e}$, provided that $a \ge c + e$ and $b \ge d + f$.

Proof: Let $s'$ be one of these strings. We also know that $a + b \ge c + d + e + f$. The first
$c + d$ letters of $s'$ are fixed because $s_1$ is a prefix of $s'$. The last $e + f$ letters of $s'$ are also fixed
because $s_2$ is a suffix of $s'$. It remains to count the number of ways we can fix the rest of the
letters of $s'$ such that $s'$ has $a$ ones and $b$ zeros. Note that $s'$ already has $c + e$ ones and $d + f$
zeros. So we have to put $a - (c + e)$ ones and $b - (d + f)$ zeros in the rest of the letters (the
unfixed letters). This can be done in $\binom{a-(c+e)+b-(d+f)}{a-(c+e)} = \binom{a+b-c-d-e-f}{a-c-e}$ ways. ■
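The counts in Lemmas 3 to 5 can be sanity-checked by exhaustive enumeration for small parameters. The following sketch is only a numerical check, not part of the proofs. The helper names are hypothetical, and Lemmas 3 and 4 are obtained from the Lemma 5 formula by taking the suffix or the prefix to be empty.

```python
from itertools import product
from math import comb


def brute_force_count(ones, zeros, prefix="", suffix=""):
    """Enumerate all binary words with the given composition, prefix and suffix."""
    count = 0
    for bits in product("01", repeat=ones + zeros):
        w = "".join(bits)
        if w.count("1") == ones and w.startswith(prefix) and w.endswith(suffix):
            count += 1
    return count


def lemma5_count(ones, zeros, prefix, suffix):
    """Closed form of Lemma 5; requires ones >= c + e and zeros >= d + f."""
    c, d = prefix.count("1"), prefix.count("0")
    e, f = suffix.count("1"), suffix.count("0")
    if ones < c + e or zeros < d + f:
        raise ValueError("hypothesis of Lemma 5 is not satisfied")
    return comb(ones + zeros - c - d - e - f, ones - c - e)


# Words with 3 ones and 3 zeros that start with "10" and end with "01".
print(brute_force_count(3, 3, "10", "01"), lemma5_count(3, 3, "10", "01"))  # 2 2
```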

In [6] it is shown that for any distinct code $S = \{s_1, s_2, \dots, s_n\}$ satisfying the inequality
$\sum_{i=1}^{n} 2^{-|s_i|} \le 3/4$, there is a binary fix-free code with the same codeword lengths. In the
following, we present a polynomial time algorithm which finds a fix-free code with the same
set of codeword compositions as the given code $S$, if there exists such a code.

Theorem 6. For any distinct code $S$ with $n$ codewords $s_1, s_2, \dots, s_n$, there is a polynomial time
algorithm which finds a fix-free code with the same set of codeword compositions as the given
code $S$, if there exists such a code.

Proof: Without loss of generality, suppose $a_1 \le a_2 \le \dots \le a_n$, where $a_i$ is the length of $s_i$,
$i = 1, 2, \dots, n$. Our algorithm has $n$ iterations. In the $i$th iteration, we find a string $s'_i$ such that
the composition of $s'_i$ is the same as the composition of $s_i$ and $s'_j$ is neither a prefix nor a suffix
of $s'_i$ for any $j < i$, as follows. After the $n$th iteration we reach the desired code $S' = \{s'_1, s'_2, \dots, s'_n\}$,
which has the same composition set as the code $S$ and is fix-free.

Let $a$ and $b$ be the number of ones and zeros in $s_i$, respectively. We want to count the number of
strings with $a$ ones and $b$ zeros that have none of the strings $s'_1, s'_2, \dots, s'_{i-1}$ as a prefix or as a
suffix. Note that we can calculate this number knowing only that the composition of each $s'_j$ is
exactly that of $s_j$, $j < i$. This means that this number depends only on the numbers of ones
and zeros of the previous strings. We derive the number as follows. The number of strings
with $a$ ones and $b$ zeros is equal to $\binom{a+b}{a}$. From this we subtract, for every $j < i$, the number of
strings which have $a$ ones and $b$ zeros and have $s'_j$ as a prefix. We also subtract, for every $j < i$,
the number of strings which have $a$ ones and $b$ zeros and have $s'_j$ as a suffix. Since we know the
numbers of ones and zeros of $s'_j$, we can calculate these numbers using Lemmas 3 and 4.

Now, note that some strings might have been subtracted twice. For example, a string $s$ might have
$s'_j$ as its prefix and also $s'_k$ as its suffix for some $j, k < i$. But there is no string $s$ that has two
different strings $s'_j$ and $s'_k$ as prefixes at the same time, because then one of these two strings
would be a prefix of the other, which contradicts the fact that none of the strings $s'_1, s'_2, \dots, s'_{i-1}$ is a
prefix or suffix of another. In the same way, there is no string $s$ that has two different strings
$s'_j$ and $s'_k$ as suffixes at the same time. Therefore we just need to add back the number of
strings with $a$ ones and $b$ zeros that have $s'_j$ as a prefix and $s'_k$ as a suffix, for every pair $j, k$
with $1 \le j, k < i$.

To calculate the number of strings which have $a$ ones and $b$ zeros, have $s'_j$ as a prefix and have
$s'_k$ as a suffix, we distinguish two cases. First, suppose that one of the two strings $s'_j$ and $s'_k$
has the same length as $s_i$. Without loss of generality suppose $a_j = a_i$. If $k \ne j$, we assert that
there is no string $s$ that has $s'_j$ as its prefix and $s'_k$ as its suffix. Otherwise, since the length of
$s'_j$ is equal to $a + b$, which is the length of $s_i$ and of $s$, we conclude that $s$ is equal to $s'_j$. Then $s'_k$
is a suffix of $s$ and hence a suffix of $s'_j$, which contradicts the fact that none of the strings
$s'_1, s'_2, \dots, s'_{i-1}$ is a prefix or suffix of another. Therefore there is no such string and our desired
number is zero. (If $k = j$, the only candidate is $s = s'_j$ itself, which is counted once if its
composition is that of $s_i$ and not at all otherwise.) The other case occurs when the lengths of both
$s'_j$ and $s'_k$ are strictly less than the length of $s_i$. Using the fact that our code is distinct we
conclude that $2|s'_j| \le a + b$ and $2|s'_k| \le a + b$, so we have $|s'_j| + |s'_k| \le a + b$. Now we can apply
Lemma 5 and calculate our desired number.

According to the inclusion-exclusion principle we should continue this process of subtracting and
adding iteratively, but actually we do not need to, because there is no string $s$ such that
three strings $s'_j$, $s'_k$ and $s'_l$ are each either its prefix or its suffix. Indeed, if there were three
strings $s'_j$, $s'_k$ and $s'_l$ each of which is either a prefix or a suffix of $s$, then by the pigeonhole
principle two of them would be prefixes of $s$, or two of them would be suffixes of $s$. In the
former case, one of the strings $s'_j$, $s'_k$ and $s'_l$ is a prefix of another, and in the latter case one of
them is a suffix of another. But this again contradicts the fact that none of the strings
$s'_1, s'_2, \dots, s'_{i-1}$ is a prefix or suffix of another. So, using this algorithm, we can iteratively count
the number of choices we have for replacing $s_i$. If this number is zero in some step, there does not
exist such a fix-free code. But if this number is greater than zero in each iteration, we have some
choice in each iteration and finally we reach a fix-free code.

So, for the string $s_i$ we count the number of strings $s'_i$ with the same composition as $s_i$
such that no $s'_j$ (for $j < i$) is a prefix or a suffix of $s'_i$. We can compute this
number as follows:

$$\binom{a+b}{a} - \sum_{1 \le j < i} \mathrm{PrefixNum}(s_i, s'_j) - \sum_{1 \le j < i} \mathrm{SuffixNum}(s_i, s'_j) + \sum_{1 \le j, k < i} \mathrm{PrefixSuffixNum}(s_i, s'_j, s'_k) \qquad (3)$$

In the above formula, $\mathrm{PrefixNum}(s_i, s'_j)$ is the number of strings with the same composition
as $s_i$ that have $s'_j$ as a prefix. $\mathrm{SuffixNum}$ is defined similarly. We also define
$\mathrm{PrefixSuffixNum}(s_i, s'_j, s'_k)$ to be the number of strings with the same composition
as $s_i$ that have $s'_j$ as a prefix and $s'_k$ as a suffix. Note that the above formula is basically the
simplified version of the inclusion-exclusion principle, knowing that there cannot be three
strings among $s'_1, s'_2, \dots, s'_{i-1}$ such that each of them is either a prefix or a suffix of the same
string.

If this number is positive we know that there exists a string with the desired properties.
But we have to find this string as well. This is done by a search in the binary
tree of all strings. Here we show that we can find the lexicographically smallest string $s'_i$ with these
properties. At first we try to find a string $s'_i$ that starts with zero. We count all strings $s'_i$ with
the desired properties that also start with zero. This can be done by changing each term in the
above formula under the assumption that $s'_i$ starts with zero. For example, instead of $\binom{a+b}{a}$ we should
write $\binom{a+b-1}{a}$. If $s'_j$ starts with one, the number $\mathrm{PrefixNum}(s_i, s'_j)$ should be replaced with zero,
because $s'_i$ is supposed to start with zero and therefore $s'_j$ cannot be its prefix.
So, we change the above formula accordingly. If the number of these strings is positive, we
know that there exists a string $s'_i$ with the desired properties that also starts with zero. So, we
fix the first digit to be zero, and go on to the next digit. Otherwise, we fix the first digit to be one.
We can iteratively continue this process until there are $a$ ones and $b$ zeros in our string. This can be
done by computing the above formula $a + b$ times (in each iteration we fix a digit).

Our algorithm runs in polynomial time in terms of n and the total number of ones and zeros 
in all n input strings. ■ 
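The procedure from the proof of Theorem 6 can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the authors' own: codewords are strings over '0' and '1', the input code is assumed to be distinct, and all function names are hypothetical. Once a partial string is fixed, it may overlap with a candidate suffix, so the helper `count_words` handles the overlapping case explicitly, a detail the proof does not spell out.

```python
from math import comb


def count_words(ones, zeros, prefix, suffix):
    """Words with the given composition that start with `prefix` and end with `suffix`."""
    length = ones + zeros
    if len(prefix) > length or len(suffix) > length:
        return 0
    overlap = len(prefix) + len(suffix) - length
    if overlap > 0:
        # Prefix and suffix overlap, so the word (if any) is fully determined.
        if prefix[-overlap:] != suffix[:overlap]:
            return 0
        return int((prefix + suffix[overlap:]).count("1") == ones)
    free = -overlap  # number of unfixed positions, as in Lemma 5
    need = ones - prefix.count("1") - suffix.count("1")
    return comb(free, need) if 0 <= need <= free else 0


def merge_prefixes(x, s):
    """Longest of x and s if one is a prefix of the other, else None."""
    if s.startswith(x):
        return s
    if x.startswith(s):
        return x
    return None


def count_valid(x, ones, zeros, chosen):
    """Inclusion-exclusion count (formula (3)) restricted to words starting with x."""
    prefixes = [p for p in (merge_prefixes(x, s) for s in chosen) if p is not None]
    total = count_words(ones, zeros, x, "")
    total -= sum(count_words(ones, zeros, p, "") for p in prefixes)
    total -= sum(count_words(ones, zeros, x, s) for s in chosen)
    total += sum(count_words(ones, zeros, p, s) for p in prefixes for s in chosen)
    return total


def fix_free_with_same_compositions(code):
    """Return a fix-free code with the same compositions, or None if the count hits zero."""
    chosen = []
    for s in sorted(code, key=len):  # process shorter codewords first
        ones, zeros = s.count("1"), s.count("0")
        if count_valid("", ones, zeros, chosen) <= 0:
            return None
        x = ""
        for _ in range(len(s)):
            x += "0" if count_valid(x + "0", ones, zeros, chosen) > 0 else "1"
        chosen.append(x)
    return chosen
```

For the distinct code {00, 10, 11, 0110}, for example, the sketch replaces the codewords one by one with the lexicographically smallest admissible strings of the same composition.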

In Lemma 7, a polynomial time algorithm is provided to find a near optimal fix-free code
when its maximum cost and the number of codewords are given. To the best of our knowledge,
it is the first approximation algorithm for this problem. We assume (without loss of generality)
that the cost of a zero is 1 and the cost of a one is $m \ge 1$.

Notice that in the case when the letter costs are equal, i.e. $m = 1$, it is known that for
each probability distribution $P = (p_1, p_2, \dots, p_n)$ there exists a fix-free code where the average
cost of the codewords is bounded above by $H(P) + 2$, where $H(P) = -\sum_{i=1}^{n} p_i \log p_i$ is the
entropy of the source. In the following lemma the objective is to minimize the average codeword
cost (defined in (1)) for general letter costs with uniform distribution of the codewords.

Lemma 7. For any given number $x$, if there exists a fix-free code $S$ with $n$ codewords
and cost at most $x$, we can find a fix-free code in polynomial time with cost at most $(5 + \frac{1}{n-1})x$.

Proof: Let $y$ be $x/n$. Note that $y$ is an upper bound on the mean cost of the $n$ codewords in $S$.
So the number of codewords with cost more than $2y$ is less than $n/2$ and the number of codewords
with cost at most $2y$ is at least $n/2$. Indeed, if at least $n/2$ codewords in $S$ had cost more than
$2y$, the total cost of $S$ would be more than $n/2 \times 2y = n \times y = x$, which is a contradiction. Let
$A$ be the number of codewords in $S$ with cost at most $2y$. We conclude that $A$ is at least $n/2$.
Name these $A$ codewords $s_1, s_2, \dots, s_A$.

These codewords have at most $l = \lfloor 2y \rfloor$ letters (including zeros and ones) and at most $k =
\lfloor 2y/m \rfloor$ ones (because a zero has cost 1 and a one has cost $m$). Let $A$ be the number of codewords
in $S$ with at most $l$ letters and at most $k$ ones; this number is still at least $n/2$.

Now we change these $A$ codewords in the following way to get $A$ new codewords that have
the same size and are also fix-free.

If some of these codewords have fewer than $l$ letters, we add zeros to their ends in order
to make all of them have the same length, $l$. So we add $l - |s_i|$ zeros at the end of $s_i$, where
$|s_i|$ is the length of $s_i$. Let $s'_i$ be the new codeword. Clearly we get $A$ codewords $s'_1, s'_2, \dots, s'_A$
with the same size, $l$. We now prove by contradiction that these $A$ new codewords are different.

Assume two codewords $s'_i$ and $s'_j$ are the same. Without loss of generality, assume that $|s_i| \ge
|s_j|$. Since $s'_i$ is the same as $s'_j$, the codeword $s_j$ is a prefix of $s_i$, which is a contradiction, because
the codewords $s_1, s_2, \dots, s_n$ come from a fix-free code, so none of them can be a prefix of another.
So the codewords $s'_1, s'_2, \dots, s'_A$ are pairwise different.

Now we can get $2A$ codewords which form a fix-free code with some modifications as
follows. For each codeword $s'_i$, add a zero at the end of $s'_i$ and get the new codeword $s'_{i0}$.
In the same way add a one at the end of $s'_i$, and get the new codeword $s'_{i1}$. Now we have
$2A$ codewords $s'_{10}, s'_{20}, \dots, s'_{A0}, s'_{11}, s'_{21}, \dots, s'_{A1}$, each of which has size $l + 1$. Since the $A$
codewords $s'_1, s'_2, \dots, s'_A$ are $A$ different codewords, these $2A$ codewords are also different, and
have the same size, so none of them is a prefix or suffix of another one.

Since $2A$ is at least $n$, we conclude that there exist $n$ codewords with length $l + 1$ and at
most $k + 1$ ones in each of the codewords.

Let $T$ be the set of all codewords with length $l + 1$ and at most $k + 1$ ones. We proved that
there are at least $n$ codewords in $T$. We just need to pick $n$ arbitrary codewords from $T$ (one
can start from the codewords with one 1, and then two 1s, and so on, and pick $n$ codewords this
way). Since all members of $T$ have the same size and two different codewords with the same
size cannot be a prefix or suffix of each other, the result of our algorithm is fix-free.

Now we analyze the cost of the code we obtained. The cost of these $n$ arbitrary codewords
is at most $[(k + 1)m + (l - k)]n$. The ratio of this cost to the optimal cost $x$ is

$$\frac{[(k+1)m + (l-k)]n}{x} = \frac{kmn}{x} + \frac{ln}{x} + \frac{(m-k)n}{x} \le 2 + 2 + \frac{mn}{x} \le 4 + 1 + \frac{1}{n-1} = 5 + \frac{1}{n-1}.$$

Note that we defined $l$ and $k$ such that $ln \le 2x$ and $kmn \le 2x$. We also know that there is at most
one word in the optimal fix-free code that does not contain any one. So there are $n - 1$ codewords in
the optimal code each of which has at least one 1. So the cost of the optimum (which is at most $x$)
is at least $(n - 1)m$, and therefore $\frac{mn}{x} \le \frac{mn}{(n-1)m} = 1 + \frac{1}{n-1}$. So we proved that the cost of our
code is at most $[5 + 1/(n - 1)]x$.

■ 
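The construction in the proof of Lemma 7 can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming a zero costs 1 and a one costs $m$. It enumerates the set $T$ by increasing number of ones, starting with the all-zero word (which also belongs to $T$), so that cheaper words are picked first.

```python
from itertools import combinations
from math import floor


def near_optimal_fix_free(n, m, x):
    """Pick n words of length l+1 with at most k+1 ones, or return None if T is too small."""
    y = x / n
    l = floor(2 * y)
    k = floor(2 * y / m)
    length = l + 1
    words = []
    # Enumerate T by increasing number of ones.
    for ones in range(min(k + 1, length) + 1):
        for positions in combinations(range(length), ones):
            word = ["0"] * length
            for p in positions:
                word[p] = "1"
            words.append("".join(word))
            if len(words) == n:
                return words
    return None  # fewer than n words in T: the algorithm fails for this x


def total_cost(code, m):
    """Total cost of a binary code when a zero costs 1 and a one costs m."""
    return sum(w.count("0") + m * w.count("1") for w in code)


code = near_optimal_fix_free(n=4, m=2, x=12)
print(code, total_cost(code, 2))
```

All returned words have the same length, so the result is fix-free. In this small instance the budget $x = 12$ is the cost of the fix-free code {00, 01, 10, 11}, and the returned cost stays below the bound $(5 + \frac{1}{n-1})x$.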

Note that when there does not exist a fix-free code with cost at most $x$, the algorithm in
Lemma 7 may return a code with cost at most $(5 + \frac{1}{n-1})x$ or fail.

Furthermore, it is useful to add that the algorithm in Lemma 7 fails if and only if the set $T$
contains fewer than $n$ codewords, and that, as $x$ increases, the size of $T$ does not decrease.
Therefore, if the algorithm is successful for some $x$, then it will be successful for all values
larger than $x$.

In the following theorem, we present an approximation algorithm that always finds a fix-free
code such that its cost is at most $5 + \frac{1}{n-1} + \epsilon$ times the cost of the optimal code.

Theorem 8. For any $n$ and $\epsilon > 0$, there is a $(5 + \frac{1}{n-1} + \epsilon)$-approximation algorithm for the
problem of finding the optimal fix-free code with $n$ codewords such that its time complexity is
polynomial in $n$ and $\frac{1}{\epsilon}$.

Proof: Let $y$ be the cost of the optimal fix-free code. If we knew the value of $y$, we could
find a fix-free code with cost at most $(5 + \frac{1}{n-1})y$ using Lemma 7, and the claim would be true.
Although $y$ is not given as an input, we can guess $y$ by a standard binary search with error $\epsilon$,
guessing $O(\log(n(n + m)/\epsilon))$ times. We know that $y$ is at least $n$. We also know that
$y$ is at most $n(n - 1 + m)$, because there are exactly $n$ codewords which have only one 1 and
$n - 1$ zeros. These codewords form a fix-free code and the cost of this code is $n(n - 1 + m)$.
So we have $n \le y \le n(n - 1 + m)$. Let $x$ be the minimum number for which the algorithm
in Lemma 7 returns a code with cost at most $(5 + \frac{1}{n-1})x$. We are going to find $x$ with error $\epsilon$.
We know that $x \le y$ and $0 \le x \le n(n - 1 + m)$. We run a binary search in the
interval $[0, n(n + m - 1)]$. In each step, we can decrease the length of our interval to half of
its previous length. For example, if we know that $x$ is in $[a, b]$, we define $z$ to be $\frac{a+b}{2}$. Next,
using Lemma 7 we can determine whether $x \le z$ or not, because if the algorithm in Lemma 7
fails, $x$ is greater than $z$. Otherwise, $x$ is at most $z$. So after each step we know that $x$ is in
$[a, \frac{a+b}{2}]$ or $[\frac{a+b}{2}, b]$. Therefore the length of our search interval is multiplied by $\frac{1}{2}$ in each
step, and after $\log(n(n + m)/\epsilon)$ steps the length of our interval is at most $\epsilon$, because at first
the length is less than $n(n + m)$. Finally we know that $x$ is in $[t, t + \epsilon]$, where the algorithm in
Lemma 7 does not fail for $t + \epsilon$. In other words, we can find a fix-free code with cost at most
$(5 + \frac{1}{n-1})[t + \epsilon]$. As we know, $t + \epsilon \le x + \epsilon \le y + \epsilon$. We conclude that the fix-free code that we
just found has a cost of at most $(5 + \frac{1}{n-1})[t + \epsilon] \le (5 + \frac{1}{n-1})[y + \epsilon] \le (5 + \frac{1}{n-1} + \epsilon)y$, because $y$ is
at least $n$. Therefore we found a fix-free code with cost at most $(5 + \frac{1}{n-1} + \epsilon)$ times the cost of
the optimal code.

July 18, 2011 DRAFT
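The binary search from the proof of Theorem 8 can be sketched as follows. This is a hedged illustration with hypothetical function names. Instead of constructing the code, `lemma7_succeeds` only checks whether the set $T$ from Lemma 7 contains at least $n$ words, by summing binomial coefficients. The search relies on the monotonicity noted above: if the check succeeds for some budget, it succeeds for all larger budgets.

```python
from math import comb, floor


def lemma7_succeeds(n, m, x):
    """True if T (words of length l+1 with at most k+1 ones) has at least n elements."""
    y = x / n
    length = floor(2 * y) + 1
    max_ones = min(floor(2 * y / m) + 1, length)
    return sum(comb(length, i) for i in range(max_ones + 1)) >= n


def smallest_feasible_budget(n, m, eps):
    """Binary search on [0, n(n+m-1)] for a budget, within eps of the minimum, that succeeds."""
    lo, hi = 0.0, float(n * (n + m - 1))  # the upper end always succeeds
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if lemma7_succeeds(n, m, mid):
            hi = mid   # the minimum feasible budget is at most mid
        else:
            lo = mid   # the minimum feasible budget is greater than mid
    return hi


budget = smallest_feasible_budget(n=4, m=2, eps=0.01)
print(budget, lemma7_succeeds(4, 2, budget))
```

The returned budget plays the role of $t + \epsilon$ in the proof: the construction of Lemma 7 succeeds for it, and it exceeds the minimum feasible budget by at most `eps`.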